Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for parsing subdirectories out of git URLs (redux) #10

Merged
merged 4 commits into from
Nov 14, 2017
Merged

Add support for parsing subdirectories out of git URLs (redux) #10

merged 4 commits into from
Nov 14, 2017

Conversation

camjackson
Copy link
Contributor

This is a cleaned up version of #9. See that PR for more context.

Copy link
Owner

@emk emk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I really like this version! It feels cleaner and more elegant.

I can think of one or two minor Rust API tweaks that we should consider before merging this. These are actually pretty simple, but apparently I need a lot of words for me to justify and explain what I'm thinking, so please don't be concerned by the length of this feedback!

Once again, thank you for tackling this. It's a great feature, and I'm delighted to have it.

}

/// Extract the repository part of the URL
pub fn repository(&self) -> String {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I'd love to change String to &str. Logically we can assume that this object owns the necessary string somehow, so we can just return a slice.

}

/// Extract the optional branch part of the git URL
pub fn branch(&self) -> Option<String> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and Option<&str>, and so on.

self.parse_parts().2
}

fn parse_parts(&self) -> (String, Option<String>, Option<String>) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(&str, Option<&str>, Option<&str>)

(
captures.at(1).unwrap().to_string(),
captures.at(3).map(|branch| branch.to_string()),
captures.at(5).map(|branch| branch.to_string()),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're using &str everywhere, we can delete the .to_string() calls here and save a bunch of heap allocations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow yeah that's way cleaner. Not sure why I was using Strings everywhere.


fn parse_parts(&self) -> (String, Option<String>, Option<String>) {
lazy_static! {
static ref URL_PARSE: Regex = Regex::new(r#"^([^#]+)(#([^:]+)?(:(.+))?)?$"#).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were being purely pendantic, it might be nice to change .unwrap() to .expect("could not parse URL_PARSE"), but that's being a bit pendantic, and much of compose_yml is old enough that I hadn't started doing that yet myself. :-)

Either expect or unwrap is fine here, because this is a true assertion violation.

Also, we might add a couple of :? "no-capture" markers as follows:

^([^#]+)(?:#([^:]+)?(?::(.+))?)?$

I think those are in the right place. :-) This would also require changing the arguments to at below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I didn't even know about no-capture, that's awesome.

lazy_static! {
static ref URL_PARSE: Regex = Regex::new(r#"^([^#]+)(#([^:]+)?(:(.+))?)?$"#).unwrap();
}
let captures = URL_PARSE.captures(&self.url).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unwrap, on the other hand, looks like a possible runtime panic, because you're matching a complex regex against a user-provided string. And unwrap/expect are really meant to be assertions, not error-checks—except maybe in very short "scripts".

So we need to do one of two things here:

  1. Provide an iron-clad argument that captures can never fail on any possible input, or
  2. Redesign this class / API so that we can only construct a GitUrl with a valid URL.

The later might look like:

pub struct GitUrl {
   url: String, // Must be private to maintain invariants.
} 

impl GitUrl {
   fn new<S: Into<String>>(url: S) -> Result<GitUrl> {
       let url = GitUrl { url: url.into() };
       // Make sure our URL can always be parsed.
       url.parse_parts()?;
       Ok(url)
   }
}

Then later you can write:

fn parse_parts(&self) -> Result<(&str, Option<&str>, Option<&str>)> { ... }

pub fn subdirectory(&self) -> Option<&str> {
    let (_, _, subdir) = self.parse_parts()
        .expect("parse_parts failed on data that we already parsed once successfully");
    subdir
}

The nice thing about this is that it allows us to add other validity checks to new in the future as needed. Alternatively, we could keep the current API, but provide a comment explaining why parse_parts can never fail on any possible input.

Copy link
Contributor Author

@camjackson camjackson Nov 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was conscious of the unwrap here. I'm pretty certain that it can't blow up though. As far as I can tell, the only way captures can fail is if the url is empty, or if it starts with a hash. And either of those cases would fail the the existing should_treat_as_url check.

Now that said, I can totally see a situation where someone might change the should_treat_as_url regex, without realising that there's an unwrap 70 lines down that depends on it.

Conclusion: I'll go with your suggestion and make new call parse_parts, so it definitely can't panic.

Copy link
Contributor Author

@camjackson camjackson Nov 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking at your proposed snippet above, you've removed the should_treat_as_url check from new. I feel like we should keep that check there, seeing as it ensures Docker will accept the URL. The parse_parts regex will accept strings like "a#b:c", or even just "a", which is obviously not a URL.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, definitely include whatever is already there in new. I wrote that new example from scratch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, great.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that said, I can totally see a situation where someone might change the should_treat_as_url regex, without realising that there's an unwrap 70 lines down that depends on it.

Yeah, a good rule of thumb is that you shouldn't call unwrap in a doubtful situation like this. At a minimum, there should be an expect and a explanation of why this really is an assertion, and not a run-time error condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'm really close with this, just struggling with the final piece. Right now I have this:

    fn parse_parts(&self) -> Result<(&str, Option<&str>, Option<&str>)> {
        lazy_static! {
            static ref URL_PARSE: Regex = Regex::new(r#"^([^#]+)(?:#([^:]+)?(?::(.+))?)?$"#)
                .expect("Could not parse regex URL_PARSE");
        }
        let captures = URL_PARSE.captures(&self.url).unwrap();
        Ok((
            captures.at(1).unwrap(),
            captures.at(2),
            captures.at(3),
        ))
    }

I've wrapped the return type in Result, and I'm handling the Result everywhere that calls this, but as you can see I still have the unwrap. I tried to replace unwrap() withok_or(())? to convert the Option into a Result but couldn't get the types to work. Is it because the Result here is an error-chain thing, not a regular Result?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you need to construct our local Error type here, which you can actually do with a string and .into(). Use ok_or_else(|| -> Error { format!(...).into() }) and it will construct an error for you.

assert_eq!(with_ref.branch(), Some("mybranch".to_string()));
assert_eq!(with_ref.subdirectory(), None);

let with_subdir = GitUrl::new(format!("{}{}", url, "#:myfolder")).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, but let's also add a test case for format!("{}{}", url, ":myfolder")), specifying how that should be parsed.

Copy link
Contributor Author

@camjackson camjackson Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the Docker docs, I don't think that's a valid URL? Also, the parsing as it's currently written relies on the hash as a nice, easy-to-spot delimeter. A lone colon is harder to split on, because it could be for a port number. 🤔

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry! I should have been more clear. I want to make sure format!("{}{}", url, ":myfolder")) isn't parsed as a subdir. It's a suspicious input, where the implementation might plausibly fail, and I want to make sure that we interpret it correctly.

Does this make sense?


/// Returns a new Context which is the same as the
/// this one, but without any subdirectory part
pub fn without_subdirectory(&self) -> Context {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an amazingly good choice of API. Thank you for figuring this out. Maybe it might be clearer if we named it without_repository_subdirectory?

- Also clean up some String allocations and tweak some naming
@camjackson
Copy link
Contributor Author

camjackson commented Nov 13, 2017

Ok I think I've addressed now except for your last point on testing, which I replied to above. Let me know what you think 🙂

Copy link
Owner

@emk emk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is looking really polished.

Two very minor nitpicks, and then I'll merge it. :-)

@@ -42,7 +42,9 @@ impl GitUrl {
pub fn new<S: Into<String>>(url: S) -> Result<GitUrl> {
let url = url.into();
if GitUrl::should_treat_as_url(&url) {
Ok(GitUrl { url: url })
let git_url = GitUrl { url };
git_url.parse_parts()?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment explaining the invariant we're trying to enforce (otherwise this might be a bit mysterious):

// Invariant: `parse_parts` should succeed for all `GitUrl` values after construction.

static ref URL_PARSE: Regex = Regex::new(r#"^([^#]+)(?:#([^:]+)?(?::(.+))?)?$"#)
.expect("Could not parse regex URL_PARSE");
}
let captures = URL_PARSE.captures(&self.url).ok_or_else(|| -> Error { "".into() })?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make that error: format!("could not parse URL {:?}", self.url).into().

@camjackson
Copy link
Contributor Author

I'm on the edge of my seat over here! 😄

@emk emk merged commit 5fb8af6 into emk:master Nov 14, 2017
@camjackson camjackson deleted the subdir_clean branch November 14, 2017 14:32
@emk
Copy link
Owner

emk commented Nov 14, 2017

It's great, and it's merged! This should be released as 0.0.50 in a couple of minutes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants